Abstract: In this paper, the resource management problem 
in geographically distributed cloud systems is considered. The 
Follow Me Cloud concept, which enables service migration across 
federated data centers (DCs), is adopted. Therefore, there are 
two types of service requests to the DC, i.e., new requests (NRs) 
initiated in the local service area and migration requests (MRs) 
generated when mobile users move across service areas. A novel 
resource management scheme is proposed to help the resource 
manager decide whether to accept the service requests (NRs 
or MRs) and determine how many resources should 
be allocated to each accepted service. The optimization 
objective is to maximize the average system reward while keeping 
the rejection probability of service requests below a given 
threshold. Numerical results indicate that the proposed scheme 
can significantly improve the overall system utility as well as 
the user experience compared with other resource management 
schemes. 

I. Introduction 

The boom in bandwidth-intensive mobile applications 
and the ever-growing number of mobile users have brought great 
challenges for today's mobile cloud systems [?]. One of the 
performance bottlenecks lies in the fact that most current 
mobile cloud systems are highly centralized. The fast growth of 
the mobile cloud computing business calls for geographically 
distributed cloud infrastructures, i.e., federated data centers 
(DCs) [?], to relieve the heavy load of the central server and 
improve user experience. 

Besides specialized cloud providers like Google, the 
evolution of mobile network architecture has promoted the 
decentralization of cloud systems as well. Toward the fifth 
generation (5G) [?] of wireless broadband, emerging paradigms 
such as network function virtualization (NFV) [?] can help 
to realize a flat and intelligent mobile network embraced with 
cloud technology. For example, in a mobile system that adopts the 
concept of C-RAN [?], scattered cloud resource blocks within 
a certain geographical area can be aggregated into a virtual 
cloud resource pool. Federated resource pools in multiple 
geographical areas can thus form a distributed cloud system. 

The Follow Me Cloud (FMC) concept, which enables service 
migration across federated DCs following the mobility 
of mobile terminals (MTs), is widely accepted in distributed 
cloud systems [?]. Receiving service from the geographically 
nearest DC generally yields a better user experience, since 
a local DC minimizes the end-to-end delay between MTs 
and cloud servers. 

In this paper, a geographically distributed cloud system 
which adopts the FMC concept is considered. Multiple DCs 
take charge of their respective service areas and each DC is 
equipped with a resource manager (RM). An MT can initiate a 
new request (NR) to the local DC in its resident service area. 
When the MT moves to another service area during the service 
period, a service migration request (MR) is sent to the 
destination DC. The corresponding RM then makes a trade-off 
between the user-perceived quality and the incurred system cost 
to decide whether the service should be migrated. RMs also 
decide how many resources should be allocated to 
each accepted service request (NR or MR). 

The resource management problem described previously is 
formulated as a constrained semi-Markov decision process 
(SMDP). Our work is inspired by [?] and [?]. The authors 
of [?] introduce an analytical model for the FMC concept, and 
in [?] the service migration procedure is modeled using an MDP. 
Other relevant works on service migration in cloud systems 
include [?] and [?]. In these previous works, the service 
model is described from the perspective of MTs, and the 
resource allocation problem is not considered. In our work, the 
decision making approach is described from the perspective 
of the overall system, and the objective is to improve the 
overall system utility. To solve the SMDP, the value iteration 
algorithm and the Q-learning algorithm are employed to obtain 
the optimal policy. Compared with other schemes, the SMDP-based 
resource management scheme can significantly increase 
the average system reward and reduce the rejection probability 
of service requests. 

The remainder of this paper is organized as follows. The 
system model and service migration procedure are described in 
Section II. In Section III, the resource management problem 
is formulated as an SMDP. The solution to the problem and 
the numerical results are presented in Section IV. Finally, the 
conclusion is drawn in Section V. 

II. System Model 

Fig. 1. Service migration diagram. (a) Migration request accepted. 
(b) Migration request rejected. 

We consider a typical 3GPP cellular network covered by 
heterogeneous wireless access nodes, e.g., macro base station 
(BS), femtocell node [?], WLAN access point, etc. Each 
hexagonal cell is equivalent to a service area and is assigned 
a DC. For simplicity, we assume that each cell is covered by 
a single macro BS which is collocated with a DC. When the 
MT enters the coverage area of a BS, it enters the service 
area of the attached DC equivalently. Each DC has a resource 
pool which contains B units of resources. MTs are offered 
resources (computation or storage) represented in the form of 
virtual machines (VMs). After a service request is accepted, a 
guest VM is constructed in the DC and the RM will allocate 
resources to the VM for running the service. 

When an MT moves from one service area to another during the 
service period, a service MR is sent to the destination DC. 
If the MR is accepted, as shown in Fig. 1(a), the corresponding 
VM in the original DC is released and a new VM is 
constructed in the destination DC. During the migration process, 
service data in the original VM needs to be transmitted to 
the new VM via backhaul. If instead the MR is 
rejected, as shown in Fig. 1(b), the VM in the original DC is 
maintained and the MT receives service via an additional 
wired link between the serving DC and the currently connected 
base station. 

The arrival process of NRs is modeled as a Poisson process 
with rate $\lambda_n$. Let $\mu_m$ denote the cross-area movement 
rate of MTs, and assume that the service time is geometrically 
distributed with mean 1/(1 - p). Therefore, the arrival 
process of MRs is also Poisson, with rate 

$$ \lambda_m = \lambda_n (1/\mu)\, \mu_m. \tag{1} $$

Fig. 2. Transition diagram of service distance. 

A six-directional random walk mobility model [12] is used 
to characterize the user movement. When an MT moves 
between two adjacent service areas, the previous distance and 
the current distance between the MT and its serving DC (defined 
by hop count among service areas) are denoted as $d_p$ and $d_c$, 
respectively. The transition probability of service distance can 
be obtained by 

where $p^m$ denotes the average rejection probability of MRs. 
$d_c = 0$ indicates that the MT is receiving service from 
the local DC. The transition diagram of service distance 
is illustrated in Fig. 2, where $T$ denotes the state in which the 
service has finished and $D$ is the maximum allowable 
service distance. When the service distance exceeds $D$, the service 
is interrupted; this situation is represented by state 
Dr. Based on the transition diagram, we can obtain the 
probability distribution of the service distance, i.e., 

$$ \left(\Pr[d_c = 1], \Pr[d_c = 2], \ldots, \Pr[d_c = D]\right) \in [0,1]^{D}. $$
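The service-distance process can also be explored by simulation. The Python sketch below is illustrative only and is not the paper's analytical model: it assumes axial coordinates for the hexagonal cells, a hypothetical per-crossing completion probability `p_end`, and independent MR rejections with probability `p_reject`. All function and parameter names are introduced here for illustration.

```python
import random

# Axial-coordinate offsets of the six neighbours of a hexagonal cell.
HEX_STEPS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_distance(a, b):
    """Hop count between two hexagonal cells given in axial coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def simulate_service(p_end, p_reject, d_max, rng):
    """Simulate one service and return the distance seen after each crossing.

    p_end    -- assumed probability that the service ends before the next crossing
    p_reject -- assumed probability that a migration request is rejected
    d_max    -- maximum allowable service distance D
    """
    mt = (0, 0)   # cell the MT is currently in
    dc = (0, 0)   # cell of the serving DC
    distances = []
    while rng.random() >= p_end:
        dq, dr = rng.choice(HEX_STEPS)       # six-directional random walk
        mt = (mt[0] + dq, mt[1] + dr)
        if rng.random() >= p_reject:
            dc = mt                          # MR accepted: service follows the MT
        d = hex_distance(mt, dc)
        if d > d_max:
            break                            # service interrupted
        distances.append(d)
    return distances


def empirical_distribution(p_end, p_reject, d_max, runs=20000, seed=1):
    """Empirical frequencies of service distance 0..d_max over many services."""
    rng = random.Random(seed)
    counts = [0] * (d_max + 1)
    for _ in range(runs):
        for d in simulate_service(p_end, p_reject, d_max, rng):
            counts[d] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

With `p_reject = 0` every MR is accepted, so the service distance stays at 0; larger rejection probabilities shift mass toward larger distances.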
III. Problem Formulation 

In this section, the service model described previously is 
formulated as a constrained SMDP. From the perspective 
of a service area, the system state $s$ describes the resource 
occupation in the DC together with the event occurring in the system, i.e., 

$$ s = \langle \hat{s}, e \rangle \in S, \quad e \in E = \{A, T, M\}, \tag{3} $$

where $S$ is the system state space and $\hat{s}$ is defined as 

$$ \hat{s} = \{ s_1^l, \ldots, s_C^l, s_1^r, \ldots, s_C^r \}. \tag{4} $$

$s_c^l$ denotes the number of local MTs (MTs that are currently 
in this service area) that each occupy $c$ units of resources of this 
DC. Similarly, $s_c^r$ denotes the number of remote MTs (MTs 
that are located in other service areas but still receive service 
from this DC) that each occupy $c$ units of resources. The total amount of 
occupied resources of this DC is $\sum_{c=1}^{C} c\,(s_c^l + s_c^r) \le B$, 
where $C$ denotes the maximum amount of resources that can be 
allocated to a single service. $e$ represents an event occurring in 
the system, and the event set $E$ is described as follows: 

• $A = \{A^n, A^m\}$. $A^n$ and $A^m$ denote the arrival of an NR 
and an MR to this DC, respectively. 

• $T = \{T_c^l, T_c^r \mid c \in \{1, \ldots, C\}\}$. $T_c^l$ denotes the completion 
and departure of a local service (service to a local MT) 
that occupies $c$ units of resources. Similarly, $T_c^r$ denotes 
the completion of a remote service (service to a remote MT) 
that occupies $c$ units of resources. 

• $M = \{M_c^l, M_c^r \mid c \in \{1, \ldots, C\}\}$. $M_c^l$ denotes the 
cross-area movement of a local MT that occupies $c$ 
units of resources. Similarly, $M_c^r$ denotes the cross-area 
movement of a remote MT that occupies $c$ units of resources. 
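The state and event definitions above can be made concrete with a small data structure. The following Python sketch is one possible encoding under stated assumptions: the class name `State`, the event labels such as `"A_n"` and `"T_l"`, and the helper names are all hypothetical and chosen here for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class State:
    """One SMDP state: occupancy counters plus the triggering event."""
    local: Tuple[int, ...]    # local[c-1]  = number of local MTs holding c units
    remote: Tuple[int, ...]   # remote[c-1] = number of remote MTs holding c units
    event: Tuple[str, int]    # (kind, c); c is 0 for arrival events

    def occupied(self) -> int:
        """Total resource units in use: sum over c of c * (local + remote)."""
        return sum(c * (l + r)
                   for c, (l, r) in enumerate(zip(self.local, self.remote), start=1))


def event_set(C: int) -> List[Tuple[str, int]]:
    """Enumerate arrivals, completions (T) and movements (M) for cap C."""
    events = [("A_n", 0), ("A_m", 0)]
    for kind in ("T_l", "T_r", "M_l", "M_r"):
        events += [(kind, c) for c in range(1, C + 1)]
    return events


def is_feasible(state: State, B: int) -> bool:
    """Check the capacity constraint: occupied resources must not exceed B."""
    return state.occupied() <= B
```

A frozen dataclass is hashable, which makes states usable as dictionary keys when tabulating values or transition probabilities.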

The occurrence time points of a sequence of events are 
called decision epochs, which are indexed by $k \in \{1, 2, \ldots\}$ in 
chronological order. At each decision epoch, the RM chooses 
an action $a$ from the action space $A_s$, which is defined as 

$$ A_s = \begin{cases} \{0, 1, \ldots, C\}, & e \in \{A^n, A^m\} \\ \{-1\}, & \text{otherwise.} \end{cases} \tag{5} $$

$a = 0$ indicates that the request is rejected by the RM. $a = c$ 
indicates that the request is accepted and $c$ units of resources 
are allocated to this service. In other cases, the RM need not 
make a decision but only updates the resource consumption in the 
system state (denoted by $a = -1$). 
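As a minimal sketch of the action space, the Python function below returns the admissible actions for an event. The event labels and the optional `free` argument, which additionally caps the allocation by the currently free capacity, are assumptions made here for illustration and are not part of the definition in the text.

```python
from typing import List, Optional

# Hypothetical labels for the two arrival events (NR and MR).
ARRIVALS = {"A_n", "A_m"}


def action_space(event_kind: str, C: int, free: Optional[int] = None) -> List[int]:
    """Return the admissible actions for an event.

    Arrivals: 0 (reject) or 1..C units (accept). If `free` is given, the
    allocation is also capped by the free capacity (an added assumption).
    All other events: only the bookkeeping action -1.
    """
    if event_kind in ARRIVALS:
        cap = C if free is None else min(C, max(free, 0))
        return list(range(0, cap + 1))
    return [-1]
```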

Based on the system state $s$ and the corresponding action 
$a$, the system reward function can be evaluated as 

$$ r(s,a) = g(s,a) - \int_0^{\tau(s,a)} d(s,a)\,dt, \tag{6} $$

where $g(s,a)$ is the lump sum income the system gains immediately 
after action $a$ is taken, $d(s,a)$ is the cost rate function which 
indicates the per unit time cost during the service period, and 
$\tau(s,a)$ is the expected sojourn time of the system state until the next 
decision epoch. The lump sum income function $g(s,a)$ can be 
expressed as 

$$ g(s,a) = \begin{cases} \cdots \\ 0, & \text{otherwise.} \end{cases} \tag{7} $$

$G_t$ and $G_m$ denote the system income for finishing a service 
and accomplishing a service migration, respectively. $C_l$ is the 
system loss caused by rejecting an NR and $C_m$ is the overhead 
incurred by service data migration from the previous serving DC 
to the migration destination DC. Finally, $C_d$ is the system loss 
due to service interruption. 

We take the end-to-end delay between an MT's connected BS 
and its serving DC as the main QoS factor related to 
service migration. This part of the delay influences the service 
response time and is proportional to the service distance. 
Another factor considered is the system resource occupation 
cost. Therefore, $d(s,a)$ can be expanded as 

$$ d(s,a) = \sum_{c=1}^{C} \left( \omega_d\, s_c^r\, \bar{d} + \omega_o\, C_r\, c\, (s_c^l + s_c^r) \right). \tag{8} $$

$C_r$ denotes the price of occupying one unit of resources per unit 
time, $\omega_d$ and $\omega_o$ ($\omega_d + \omega_o = 1$) are weighting factors indicating 
the relative importance of service delay and resource 
occupation cost, and $\bar{d}$ denotes the average service distance, which 
can be calculated by $\bar{d} = \sum_{d=1}^{D} d \cdot \Pr[d_c = d]$. 
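The cost rate can be evaluated directly from the occupancy counters. The Python sketch below is a hedged illustration: it assumes that the delay term of (8) applies to remote MTs only (local MTs have service distance 0), and the function and parameter names are introduced here rather than taken from the paper.

```python
from typing import Sequence


def mean_service_distance(dist: Sequence[float]) -> float:
    """Average service distance; dist[d-1] holds Pr[d_c = d] for d = 1..D."""
    return sum(d * p for d, p in enumerate(dist, start=1))


def cost_rate(local: Sequence[int], remote: Sequence[int],
              w_d: float, w_o: float, c_r: float, d_bar: float) -> float:
    """Per-unit-time cost: weighted delay cost plus weighted occupation cost.

    local[c-1] / remote[c-1] -- number of local / remote MTs holding c units
    w_d, w_o                 -- weighting factors (expected to sum to 1)
    c_r                      -- price of one resource unit per unit time
    d_bar                    -- average service distance
    """
    # Assumption: only remote MTs incur a distance-related delay cost.
    delay = w_d * d_bar * sum(remote)
    occupation = w_o * c_r * sum(
        c * (l + r) for c, (l, r) in enumerate(zip(local, remote), start=1))
    return delay + occupation
```

For example, with one local MT holding 1 unit, two remote MTs holding 2 units each, equal weights of 0.5, unit price 1 and an average distance of 2, the delay part is 2.0 and the occupation part is 2.5.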

The state duration function $\tau(s,a)$, which denotes the expected 
time length between two consecutive decision epochs, is the 
reciprocal of the event rate $\gamma(s,a)$, i.e., 

$$ \gamma(s,a) = \tau(s,a)^{-1} = \lambda_n + \lambda_m + \sum_{c=1}^{C} (\mu + \mu_m)(s_c^l + s_c^r). \tag{9} $$
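Because the cost rate is constant between two decision epochs, the integral in (6) reduces to the product of the cost rate and the expected sojourn time. The Python sketch below illustrates this together with (9). The function names are hypothetical, and the occupancy counters passed in are assumed to already reflect the chosen action.

```python
from typing import Sequence


def event_rate(lam_n: float, lam_m: float, mu: float, mu_m: float,
               local: Sequence[int], remote: Sequence[int]) -> float:
    """Total event rate: arrivals plus (completion + movement) per active service."""
    return lam_n + lam_m + (mu + mu_m) * (sum(local) + sum(remote))


def sojourn_time(lam_n: float, lam_m: float, mu: float, mu_m: float,
                 local: Sequence[int], remote: Sequence[int]) -> float:
    """Expected time until the next decision epoch (reciprocal of the event rate)."""
    return 1.0 / event_rate(lam_n, lam_m, mu, mu_m, local, remote)


def reward(g: float, d: float, tau: float) -> float:
    """Reward over one sojourn: lump sum income minus cost rate times duration."""
    return g - d * tau
```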

The RM chooses actions according to a certain policy, which 
is defined as $\Pi = (\delta_1(s), \delta_2(s), \ldots)$, where $\delta_k(s) = a$ is the action 
decision rule at the $k$-th decision epoch. In this paper, we 
consider stationary policies only, which remain constant at 
different decision epochs, i.e., $\Pi = (\delta(s), \delta(s), \ldots)$. Given a 
feasible unichain policy $\Pi$, the induced state transition process 
forms a Markov chain with transition probabilities of 

$\gamma(s', \delta(s))$, 

where $s, s' \in S$ and $e'$ is the event element in state $s'$. 

The average reward optimality criterion is used in this 
model. Therefore, the optimization objective of the SMDP is 
to achieve the optimal policy satisfying 


max 

$\mathcal{P}$ is the set containing all feasible policies. $E_\Pi[*]$ represents 
the expected value of quantity $*$ under policy $\Pi$. $r_k(s,a)$ 
and $\tau_k(s,a)$ are the reward function and the state duration 
function in the $k$-th time period. $p_k^n$ and $p_k^m$ are the average rejection 
probabilities of NRs and MRs at the $k$-th decision epoch, 
respectively, $\omega_n$ and $\omega_m$ are the corresponding importance factors, and 
$\omega_n + \omega_m = 1$. $p = \omega_n \hat{p}^n + \omega_m \hat{p}^m$ is the threshold of the 
rejection probability of service requests, where $\hat{p}^n$ and $\hat{p}^m$ are the 
maximum allowable rejection probabilities of NRs and MRs, 
respectively. By introducing the Lagrange multiplier $\beta$, the 
above constrained optimization problem can be converted to 
an unconstrained one as 


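$$ \max_{\Pi \in \mathcal{P}} \bar{r}_\beta = \lim_{K \to \infty} E_\Pi\!\left[ \frac{1}{K} \sum_{k=0}^{K} \frac{r_k(s,a)}{\tau_k(s,a)} \right] - \beta \lim_{K \to \infty} E_\Pi\!\left[ \frac{1}{K} \sum_{k=0}^{K} \left( \omega_n p_k^n + \omega_m p_k^m \right) \right]. \tag{12} $$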

The optimal policy which satisfies the above equations can 
be obtained by solving the following Bellman equations [13] 
recursively, i.e., 

$$ V(s) = \max_{a \in A_s} \left\{ r_\beta(s,a) - \theta\, \tau(s,a) + \sum_{s' \in S} p(s'|s,a)\, V(s') \right\}, \quad \forall s \in S. \tag{13} $$
$\theta$ and $V(s)$ are called the average system gain and the potential 
function of state $s$, respectively. $r_\beta(s,a)$ is the Lagrange reward function, 
which is given by 

$$ r_\beta(s,a) = g(s,a) - \int_0^{\tau(s,a)} d(s,a)\,dt - \beta f(s,a), \tag{14} $$

where $f(s,a)$ is the constraint function, which is given by 

$$ f(s,a) = \begin{cases} \omega_n, & e = A^n,\ a = 0 \\ \omega_m, & e = A^m,\ a = 0 \\ 0, & \text{otherwise.} \end{cases} \tag{15} $$
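The constraint function and the Lagrange reward translate into a few lines of code. The Python sketch below is illustrative: it reads the second case of the constraint function as a rejected MR (action 0), uses hypothetical event labels, and again treats the cost rate as constant over the sojourn time.

```python
def constraint_value(event_kind: str, action: int, w_n: float, w_m: float) -> float:
    """Penalty weight charged when an NR or an MR is rejected (action 0)."""
    if action == 0 and event_kind == "A_n":
        return w_n
    if action == 0 and event_kind == "A_m":
        return w_m
    return 0.0


def lagrange_reward(g: float, d: float, tau: float, beta: float, f: float) -> float:
    """Lagrange reward: income minus accumulated cost minus beta times the penalty."""
    return g - d * tau - beta * f
```

A larger multiplier makes rejections more expensive, which is why the rejection probability of the resulting policy decreases as the multiplier grows.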

If there exists a policy $\Pi^*$ satisfying (13), it is called the 
optimal policy and we have $\theta^* = \bar{r}_{\beta^*}$, where $\theta^*$ is the maximum 
average system gain in (13) corresponding to the optimal policy 
$\Pi^*$ and $\bar{r}_{\beta^*}$ is the maximum average gain which satisfies (12). 
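For intuition, the Bellman equations can be solved numerically once the model is in discrete time, where every sojourn time equals 1. The Python sketch below is a generic relative value iteration for an average-reward MDP, not the paper's implementation. The array layout `P[a][s][t]` and `R[s][a]` is an assumption made here, and the iteration presumes an aperiodic unichain model.

```python
from typing import List, Sequence, Tuple


def relative_value_iteration(
    P: Sequence[Sequence[Sequence[float]]],
    R: Sequence[Sequence[float]],
    tol: float = 1e-9,
    max_iter: int = 100_000,
) -> Tuple[float, List[int], List[float]]:
    """Solve a discrete-time average-reward MDP.

    P[a][s][t] -- probability of moving from state s to state t under action a
    R[s][a]    -- one-step reward for taking action a in state s
    Returns (average gain, greedy policy, relative values with h[0] = 0).
    """
    n_states = len(R)
    n_actions = len(R[0])
    h = [0.0] * n_states
    for _ in range(max_iter):
        # One Bellman backup for every state-action pair.
        q = [[R[s][a] + sum(P[a][s][t] * h[t] for t in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        v = [max(row) for row in q]
        diff = [v[s] - h[s] for s in range(n_states)]
        # The optimal gain lies between min(diff) and max(diff).
        if max(diff) - min(diff) < tol:
            gain = 0.5 * (max(diff) + min(diff))
            policy = [max(range(n_actions), key=q[s].__getitem__)
                      for s in range(n_states)]
            return gain, policy, h
        # Subtract the value of a reference state to keep h bounded.
        h = [v[s] - v[0] for s in range(n_states)]
    raise RuntimeError("value iteration did not converge")
```

On a two-state toy model where action 1 in state 1 pays 2 and tends to stay there, the iteration returns the policy that heads for state 1 and the corresponding average gain.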


IV. Problem Solution and Performance Evaluation 

A. Solution to the Constrained SMDP 


In this paper, the value iteration algorithm (VIA) is used for 
obtaining the optimal policy, before which the SMDP has to be 
transformed to an equivalent discrete-time model as follows. 

$$ \tilde{r}_\beta(s,a) = r_\beta(s,a)/\tau(s,a), \quad \forall s \in S, \tag{16} $$

$$ \tilde{p}(s'|s,a) = \begin{cases} \eta\, p(s'|s,a)/\tau(s,a), & s \ne s' \\ 1 + \eta\, [p(s'|s,a) - 1]/\tau(s,a), & s = s'. \end{cases} \tag{17} $$

All quantities with a tilde denote the corresponding ones in the 
transformed model. Then we can employ the VIA [13] directly 
with a given value of $\beta$. 
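The transformation to a discrete-time model can be sketched for a single state-action pair. In the Python code below, the name `eta` for the scaling constant is an assumption (its symbol is not legible in the source), as is the requirement that it be positive and no larger than every sojourn time so that the transformed probabilities stay in [0, 1].

```python
from typing import List, Sequence


def transform_reward(r_beta: float, tau: float) -> float:
    """Reward per unit time in the transformed model."""
    return r_beta / tau


def transform_row(p_row: Sequence[float], s: int, tau: float, eta: float) -> List[float]:
    """Transformed transition probabilities out of state s for one action.

    p_row[t] -- original probability of moving from s to t
    tau      -- expected sojourn time of this state-action pair
    eta      -- scaling constant, assumed to satisfy 0 < eta <= tau
    """
    if not 0.0 < eta <= tau:
        raise ValueError("eta must lie in (0, tau]")
    return [
        1.0 + eta * (p - 1.0) / tau if t == s else eta * p / tau
        for t, p in enumerate(p_row)
    ]
```

The off-diagonal probabilities are scaled down by `eta / tau` and the removed mass is placed on the self-transition, so each transformed row still sums to 1.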



Fig. 3. Average per unit time reward of the system under different policies 
(horizontal axis: arrival rate of new service requests). 

Fig. 4. Average resource amount allocated to each service under the optimal 
policy. 

The system state transition matrix under policy $\Pi$ is denoted 
as $P = (p(s'|s,a) \mid s, s' \in S) \in \mathbb{R}^{M \times M}$, where $M$ is the 
size of the system state space. For a feasible $p$, there exists a 
$\beta^*$ satisfying 

$$ \sum_{s \in S} \pi_{\beta^*}(s)\, f(s, \delta_{\beta^*}(s)) = p, \tag{21} $$

where $\pi_{\beta^*}(s)$ is the steady-state probability of state $s$ and 
$\Pi_{\beta^*} = \delta_{\beta^*}(s)$ denotes the policy obtained by solving (13) 
with $\beta^*$. Assume that $\beta^-$ is smaller than $\beta^*$ and $\beta^+$ is larger 
than $\beta^*$; then we have 

$$ p_{\beta^-} = \sum_{s \in S} \pi_{\beta^-}(s)\, f(s, \delta_{\beta^-}(s)) > p, \tag{22} $$

$$ p_{\beta^+} = \sum_{s \in S} \pi_{\beta^+}(s)\, f(s, \delta_{\beta^+}(s)) < p, \tag{23} $$

where $\Pi_{\beta^-} = \delta_{\beta^-}(s)$ and $\Pi_{\beta^+} = \delta_{\beta^+}(s)$ are the policies obtained by 
solving (13) with $\beta^-$ and $\beta^+$, respectively. 

When solving (13), we set $\beta$ to an arbitrary positive value and 
employ the VIA to get a temporary optimal policy $\Pi_\beta$. Then 
the expected rejection probability $p_\beta$ can be calculated with 
(21). If $p_\beta \ne p$, we adjust the value of $\beta$ according to 
(22) and (23). Repeating this procedure, the value of $\beta$ converges 
to $\beta^*$ with arbitrarily small error. This approach is referred 
to as the Q-learning VIA, and the detailed flow is shown in 
Algorithm 1. 

Algorithm 1 The Q-learning VIA 

1. Set $\beta = \beta^1$, where $\beta^1$ is an arbitrary number greater than 0. Specify 
$\varepsilon > 0$ and set $n = 1$. 

2. Substitute $\beta^n$ into (16) as 

$$ \tilde{r}_{\beta^n}(s,a) = r_{\beta^n}(s,a)/\tau(s,a), \quad \forall s \in S. \tag{18} $$

Solve for the policy $\Pi^n$ in the transformed model with the reward 
function given by (18) via the VIA. 

3. Calculate the system steady-state distribution under policy 
$\Pi^n$. Then calculate 

$$ p_{\beta^n} = \sum_{s \in S} \pi_{\beta^n}(s)\, f(s, \delta_{\beta^n}(s)). \tag{19} $$

4. Let $\Delta^n = p_{\beta^n} - p$. If $|\Delta^n| < \varepsilon$ when $n \ge 2$, go to step 5. 
Otherwise update $\beta^n$ by 

$$ \beta^{n+1} = \beta^n + \frac{\alpha}{n} \Delta^n, \tag{20} $$

where $\alpha$ is the step size, which can be revised during the iteration 
process. Then go to step 2. 

5. Take $\Pi^n$ as the optimal policy and stop. 
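The outer loop of Algorithm 1 can be sketched independently of the inner VIA. In the Python code below, `rejection_prob` is a hypothetical callable standing in for steps 2 and 3 (solving for the policy and evaluating its expected rejection probability). The sign convention is an assumption chosen so that the multiplier grows when the rejection probability is above the threshold, and the check on the iteration index in step 4 is omitted for brevity.

```python
from typing import Callable, Tuple


def tune_multiplier(
    rejection_prob: Callable[[float], float],
    p_target: float,
    beta: float = 1.0,
    alpha: float = 1.0,
    eps: float = 1e-4,
    max_iter: int = 10_000,
) -> Tuple[float, int]:
    """Adjust the Lagrange multiplier until the rejection probability meets p_target.

    rejection_prob(beta) -- placeholder for "solve with the VIA, then evaluate
                            the steady-state rejection probability"
    Returns (beta, number of evaluations used).
    """
    for n in range(1, max_iter + 1):
        delta = rejection_prob(beta) - p_target
        if abs(delta) < eps:
            return beta, n
        # Diminishing step alpha / n; beta is kept non-negative.
        beta = max(0.0, beta + (alpha / n) * delta)
    raise RuntimeError("multiplier search did not converge")
```

Because the rejection probability decreases as the multiplier grows, increasing the multiplier when the probability is too high and decreasing it otherwise moves it toward the value at which the constraint is met with equality. The diminishing step can be slow when the rejection probability is insensitive to the multiplier, in which case a larger `alpha` helps.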

B. Numerical Results and Analysis 

Fig. 5. Average rejection probability of service requests under different 
policies. (b) Average rejection probability of MRs. 

In this subsection, the performance of the SMDP-based 
resource management scheme is evaluated. The optimal policy 
$\Pi^*$ obtained by employing the Q-learning VIA is compared 
with four reference baselines. Baseline 1 refers to a 
short-sighted policy obtained by employing the greedy 
algorithm [?] (called the Greedy policy). Baselines 2 and 3 are two 
simple, straightforward allocation methods: one allocates 
all available resources in the DC to each service request (called 
the All policy), and the other allocates a fixed amount of resources 
to every service request (called the Fixed policy). The last baseline 
refers to the resource reservation scheme proposed in [?] 
(referred to as the R-RSV policy), which reserves a small 
portion of the resources only for MRs. 
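The simpler baselines can be written as one-line allocation rules. The Python sketch below is an interpretation under stated assumptions rather than the evaluated implementation: the per-service cap `C` applied in the All policy, the fixed amount `k`, and the parameter `reserved` in the reservation rule are hypothetical details. Each function returns the number of units to allocate, with 0 meaning that the request is rejected.

```python
def all_policy(free: int, C: int) -> int:
    """'All' baseline: give every free unit to the request (capped at C, an assumption)."""
    return min(free, C) if free > 0 else 0


def fixed_policy(free: int, k: int) -> int:
    """'Fixed' baseline: allocate exactly k units, or reject if fewer are free."""
    return k if free >= k else 0


def reservation_policy(free: int, k: int, reserved: int, is_migration: bool) -> int:
    """Reservation baseline: NRs may not touch the units reserved for MRs."""
    usable = free if is_migration else free - reserved
    return k if usable >= k else 0
```

With 3 free units, a fixed allocation of 2 and 2 reserved units, a new request is rejected while a migration request is still accepted, which is how reservation lowers the rejection probability of MRs.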

The average per unit time reward of the system under 
different policies is presented first. As Fig. 3 shows, the SMDP-based 
policy outperforms the other baselines on average system 
reward, especially when service requests arrive 
intensively. The R-RSV policy also performs well 
because it reserves some resources for migrated services, 
which reduces the rejection probability of MRs. Fig. 4 
illustrates the average resource amount the RM allocates to 
each service request when the SMDP-based resource 
management scheme is adopted. We can see that the average resource 
amount allocated to an NR is higher than the average resource 
amount allocated to an MR. As the NR arrival rate increases, 
the average resource amount allocated to each service request 
decreases to ensure that the DC can serve more MTs. 

The rejection probabilities of NRs and MRs under different 
policies are illustrated in Fig. 5. It can be seen that the SMDP-based 
policy can significantly reduce the rejection probabilities 
of both NRs and MRs. By adjusting the value of the threshold $p$, 
the rejection probability of service requests can be controlled 
within a certain range. Therefore, we can conclude that for 
distributed cloud systems which support service migration, 
the proposed resource management scheme can significantly 
improve the overall system utility as well as the user-perceived 
quality. 

V. Conclusion 

In this paper, the resource management problem in 
geographically distributed cloud systems which adopt the FMC 
concept is considered. An SMDP-based admission control and 
resource allocation scheme is proposed to help RMs 
decide whether to accept the service requests (NRs or 
MRs) and determine the amount of resources allocated 
to each accepted service. The optimization objective is to 
maximize the average system reward while keeping the rejection 
probability of service requests under a given threshold. To 
determine the value of the Lagrange multiplier, the Q-learning 
algorithm is used, and then the VIA is employed to obtain the 
optimal policy. Numerical results indicate that the proposed 
resource management scheme can improve the system reward 
while also reducing the rejection probability of service requests. 